13 + 29[1] 42
After completing this tutorial you should be able to
R script.functions and modify the default options using their arguments.vector is and distinguish between the main data types.vector.data.frames and vectors relate.ObjectsYou can think of the R console as a super-powerful calculator. You can get output from R by simply typing math directly into the console.
13 + 29[1] 42
or
546 / 13[1] 42
Well that’s fun - but not super helpful in our context.
In the R programming language, an object is a fundamental concept used to store, manipulate, and represent data. Everything in R is treated as an object, whether it’s a number (numeric), a text string (character), a dataset (data.frame), or even more complex data structures.
Objects in R can be created, modified, and used to perform various operations. Objects are assigned names that you can then use to reference them in your code. When you create an object, you’re essentially creating a container that holds a value or data.
Creating an object is straightforward. First, we give it a name, then we use the assignment operator to assign it a value.
The assignment operator (<-) assigns the value on the right of the <- to the object on the left1.
1 Start building good habits starting now in terms of your coding style. For example, your code is a lot more readable if you use white space to your advantage. For example, make sure you have a space before and after your <-
fork_length_mm <- 344Type that into the console and execute the command using Enter. If you look at your Global Environment (bottom left panel) you should now see forklength and the value you assigned it.
Notice, how when you assigned a value to your new object nothing was printed in the console compared to when you were typing in math.
To print the value of an object you can type the name of the object into the console.
# print value
fork_length_mm[1] 344
Now that fork_length_mm is in your environment we can use it to compute instead of the value itself.
For example, we might need to convert our fork length from millimeters (mm) to centimeters (cm).
fork_length_mm / 10 [1] 34.4
We can change the value of an object any time by assigning it a new one. Changing the value of one object does not change the values of other objects.
fork_length_mm <- 567Some initial thoughts on naming things2
2 You will soon discover that coding is 90% naming things.
Theoretically, we can name objects anything we want - but before that gets out of hand let’s think about some guidelines for naming objects.
fork_length is not the same as Fork_Length..) in names. Typically dots are used in function names and also have special meaning (methods) in R.if, else, for) and cannot be used as names for objects; in general avoid using names that have already been used by other function names.Using a consistent style for naming your objects is part of adopting a consistent styling of your code; this includes things like spacing, how you name objects, and upper/lower case. Clean, consistent code will make following your code a lot easier for yourself and others3.
3 Remember, future you is your most important collaborator.
One of the criteria for your homework assignments and skills tests is your code style. Next to imitating the style of coding presented in this manual, you can refer to r4ds (2e) Ch 5 for some initial pointers, you can also access a short style guide here and a more detailed, tidyverse specific style guide here.
So far, we have inputed all of our code directly into the console. If you scroll up in the console you will find that all the commands and results from your current R session are still in the console. Using Cmd/Ctrl + L will clear the entire console.
Uh-oh - what if we need to go back over the code we just cleared?
Well, for one if you check the History tab in the top right panel you will see that all your commands have been recorded. If you highlight one of them and either click on To Console or hit Enter it will send it directly to the console.
Usually your history will be saved automatically when you close R/end an R session (unless you have changed the settings) and it will be restored when you open R again. You can use the broom icon to clear your entire history.
Uh-oh - now what do we do?
In general, you should only be typing code directly into the console for quick queries or troubleshooting but since usually we want to be able to revisit and share our work you will want to be able to save your work in an R script (*.R) or include it in a quarto document (*.qmd) as a code chunk. For this course we will mostly be operating with quarto files (more on that in the next chapter).
You can open a new R script using Ctrl + Shift + C or using File > New File > R Script. This will open an R script in a new tab in the top left pane.
Save your R script using Cmd/Ctrl + S or File > Save As - this will open a dialogue box for you to save your R script with the file extension .R.
Ctrl + Enter will execute commands directly from the script editor by sending them through to the console. You can use this to run the line of code your cursor is currently in in the script editor or you can highlight a series of lines to execute. You can also run all the code in a script by clicking on the Run button.
Create a new R script to keep track of the rest of the things we will learn today.
You can add comments to your R scripts using #. Essentially, once you type an # in a line anything to the right of it will be ignored.
This is really helpful as it will allow you to comment your script, i.e. you can leave notes and explanations as to what your code is doing for future you and for other collaborators. This is especially helpful if you come back to some of your code after a period of time, if you are sharing your code with others, and when you are debugging code. You will find that as you become more experienced your comments will become shorter and more concise and you might even be tempted to leave them out completely - don’t4!
4 To help you build a habit of good commenting practice, commenting your code is a requirement for your homework assignment and skills tests.
For example you might find a comment like this more helpful at the moment:
# assign value to new object total length
fork_length <- 436But soon you’ll find this just as helpful:
# total length fish
fork_length <- 436FL <- 436 # total length fishYou can comment/uncomment multiple lines at once by highlighting the lines you want to comment (or uncomment) and hitting Ctrl + Shift + C. This can be useful if you are playing around with code and don’t want to delete something but don’t want it to be run either.
When we installed R packages earlier we mentioned that they are sets of predefined functions. These are essentially mini-scripts that automate using specific sets of commands. So instead of having to run multiple lines of code (this can be 10s - 100s of lines code) you call the function instead.
Each function usually requires multiple inputs (arguments) and once executed return a value (though this is not always the case).
For example the function round() can be used to round a number5.
5 This is an excellent example of naming things well!
fork_length_cm <- round(34.8821)If we print the value of our object we see the following value is returned.
fork_length_cm[1] 35
For this function the input (argument) is a number and the returned value is also a number. This is not always the case, arguments can be numbers, objects, file paths …
Many functions have set of arguments that alter the way a function operates - these are called options. Generally, they have a default value which are used unless specified otherwise by the user.
You can determine the arguments as function by calling the function args().
args(round)function (x, digits = 0)
NULL
Or you can call up the help page using ?round or by typing it into the search box in the help tab in the lower right panel.
For example, our round() function has an argument called digits, we can use this to specify the number of significant digits we want our rounded value to have.
round(34.8821, digits = 2)[1] 34.88
If you provide the arguments in the exact same order as they are defined you do not have to specify them.
round(34.8821, 2)[1] 34.88
However, if you specify the arguments, you can switch their order.
round(digits = 2, x = 34.8821)[1] 34.88
Good code style is to put the non-optional arguments (frequently the object, file path or value you are using) first and then specify the names of all the optional arguments you are specifying. This provides clarity and makes it easier for yourself and others to follow your code.
Occasionally you might even want to use comments to further specify what each argument is doing or why you are choosing a specific option.
round(34.8821, # number to round
digits = 2) # specify number of significant digits[1] 34.88
Now that we’ve figured out what objects and functions are let’s get to know the two data types we will be spending the most time with this semester - vectors and data frames (data.frame)6.
6 Other data types include lists (list), factors (factor) matrices (matrix), and arrays (array); we’ll introduce those later on.
The most simple data type in R is the (atomic) vector which is a linear vector of a single type. There are six main types -
character: strings or words.numeric or double: numbers.integer: integer numbers (usually indicated as 2L to distinguish from numeric).logical: TRUE or FALSE (i.e. boolean data type).complex: complex numbers with real and imaginary parts (we’ll leave it at that).raw: bitstreams (we won’t use those either).You can check the data type of any object using class().
class(fork_length)[1] "numeric"
Currently, our fork_length object consists of a single value. The function c() (concatenate) will allow us to assign a series of values to an object.
fork_length <- c(454, 234, 948, 201)
fork_length[1] 454 234 948 201
We call the same function to create a character vector.
sharks <- c("bullshark", "blacktip", "scallopedhammerhead")
class(sharks)[1] "character"
The quotes around "bullshark" etc. are essential because they indicate that this is a character.
If we do not use quotes, R will assume that we are trying to call an object and you will get an error code along the lines of “! object 'bullshark' not found”.
You can use c() to combine an existing object with additional elements (assuming they are the same data type).
species <- c(sharks, "gafftop")
species[1] "bullshark" "blacktip" "scallopedhammerhead"
[4] "gafftop"
Next to class() there are other helpful functions to inspect the content of a vector. For example length() will tell you how many elements are in a particular vector.
length(fork_length)[1] 4
The function str() will give you an overview of the structure of any object and its elements.
str(fork_length) num [1:4] 454 234 948 201
Recall, that an atomic vector is a linear vector of a single type. Let’s explore what that means by taking a look at what happens if we create atomic vectors where we mix the data types.
numeric_character <- c(1, 2, 3, "a")
numeric_logical <- c(1, 2, 3, TRUE)
character_logical <- c("a", "b", "c", TRUE)
wtf <- c(1, 2, 3, "4")We already discovered that we can combine vectors - but can we extract certain components from vectors? Indeed, there are a variety of ways that we can subset vectors.
The most simple way is using square brackets to indicate which element (or elements) we can’t extract. In R, indices start at 1.7
7 This is not the case for all programming languages, e.g. Perl, Python, or C++ start with 0.
# extract second element
species[2][1] "blacktip"
# extract fourth and second element
species[c(4, 2)][1] "gafftop" "blacktip"
You can also repeat indices to create a new object with additional elements.
species_longer <- species[c(2, 2, 4, 3, 4, 4, 1, 1)]
species_longer[1] "blacktip" "blacktip" "gafftop"
[4] "scallopedhammerhead" "gafftop" "gafftop"
[7] "bullshark" "bullshark"
More frequently, we will want to extract certain elements based on a specific condition (conditional subsetting).
This is done using a logical vector, here TRUE select the element with the same index and FALSE will not.
fork_length <- c(454, 234, 948, 201)
# use logical vector to subset
fork_length[c(TRUE, FALSE, TRUE, FALSE)][1] 454 948
This seems like a very impractical option. However, normally we would not create the logical vector by hand as we have done here, rather it will be the output of a function or logical test. For example, we might want to identify fish with a fork length > 300mm.
# identify fish with fork length > threshold
fork_length > 300[1] TRUE FALSE TRUE FALSE
This creates an output the same length as the vector we looked at (fork_length) consisting of TRUE/FALSE statements for each element by comparing the each element of the vector to the condition (>300) and determining if the condition is met (the statement is true) or not.
Instead of first creating a vector of TRUE/FALSE statements can use this condition to subset our vector directly.
# identify true/false of fish with fork length > threshold
fork_length[c(fork_length > 300)][1] 454 948
There are a series of boolean expressions8 we can use for subsetting vectors.
8 Boolean expressions are logical statements that are either true or false; most of them you are probably already familiar with because math
> and < (greater/less than)=> and =< (equal to or greater/less than)== (equal to) and != (is not equal to)You can combine to boolean expressions using &, (both conditions must be met) and | (at least one condition must be met).
R is set apart from other programming languages because it was designed to analyze data9 it has straightforward ways to deal with missing data (NA or na values) because those are quite common in real world data sets.
9 some people will argue that it is a ‘statistical language’ rather than a true programming language … don’t listen to them, they are just jealous of your R skillz!
Let’s create a vector with a missing value.
total_length <- c(560, NA, 1021, 250)Let’s say we want to calculate the mean value.
mean(total_length)[1] NA
Most functions will return NA when doing operations on objects with missing values. As such, many functions include an argument to omit missing values.
mean(total_length, na.rm = TRUE)[1] 610.3333
Other functions that are helpful to deal with missing values are is.na(), na.omit(), and complete.cases().
Recall that atomic vectors are linear vectors of a simple type, essentially they are one dimensional. Frequently we will be using data frames (data.frame) which you can think of as consisting of several vectors of the same length where each vector becomes a column and the elements are the rows.
Let’s create a new object that is a dataframe with three columns containing information on species, fork length, and total length.
# combine vectors into data frame
catch <- data.frame(species, fork_length, total_length)You should now see a new object in your Global Environment and you will now also see that there are two categories of objects Data and Values. You will see that the data.frame is described as having 4 obs (observations, those are your rows) of 3 variables (those are your columns). If you click on the little blue arrow it will give you additional information on each column - note that because each column is essentially a vector, each one must consist of a single data type which is also indicated.
Calling the str() will give you the same information.
str(catch)'data.frame': 4 obs. of 3 variables:
$ species : chr "bullshark" "blacktip" "scallopedhammerhead" "gafftop"
$ fork_length : num 454 234 948 201
$ total_length: num 560 NA 1021 250
You can further inspect the data.frame by clicking on the little white box on the right which will open a tab in the top left panel next to your R script. You can also always view a data.frame by calling the View() function.
View(catch)This can be a helpful way to explore your data.frame, for example, clicking on the headers will sort the data frame by that column. Usually we won’t build or data.frames by hand, rather we will read them in from e.g. a tab-delimited text file - but more on that later.